Course 7001 Mini Project: Performance Evaluation of Hadoop on Virtual Machines
Abstract
MapReduce [1] is a popular programming framework designed to parallelize computation automatically in the cloud. MapReduce targets data-intensive applications. A huge amount of data is first loaded from a remote distributed file system (DFS). It is then copied as intermediate results from the Mappers to the Reducers, and finally written back to the DFS. Besides this large volume of data transfer, MapReduce instances also incur many I/O operations.
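The data flow described above can be illustrated with a minimal, single-process Python sketch of a word-count job. This is not Hadoop's API. The function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) are hypothetical, and in-memory lists and dicts stand in for the DFS read and write. The shuffle byte count uses an assumed cost model (UTF-8 key length plus 4 bytes per integer value) rather than Hadoop's real serialization format.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, int]


def map_phase(lines: Iterable[str]) -> List[Pair]:
    """Mapper: emit a (word, 1) pair for every word in the input split."""
    return [(word, 1) for line in lines for word in line.split()]


def shuffle(pairs: List[Pair], num_reducers: int) -> Tuple[List[Dict[str, List[int]]], int]:
    """Partition intermediate pairs across reducers and group values by key.

    Also returns an estimate of the bytes copied from mappers to reducers,
    using an ASSUMED cost model: UTF-8 key length + 4 bytes per int value.
    """
    partitions: List[Dict[str, List[int]]] = [defaultdict(list) for _ in range(num_reducers)]
    copied_bytes = 0
    for key, value in pairs:
        # crc32 gives a partition that is stable across runs (built-in hash() is salted).
        reducer = zlib.crc32(key.encode("utf-8")) % num_reducers
        partitions[reducer][key].append(value)
        copied_bytes += len(key.encode("utf-8")) + 4
    return partitions, copied_bytes


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reducer: sum the grouped counts for each key."""
    return {key: sum(values) for key, values in groups.items()}


def run_job(input_split: List[str], num_reducers: int = 2) -> Tuple[Dict[str, int], int]:
    """Map -> shuffle -> reduce, with in-memory stand-ins for DFS input and output."""
    # input_split stands in for data loaded from the DFS.
    intermediate = map_phase(input_split)
    partitions, copied = shuffle(intermediate, num_reducers)
    # output stands in for results written back to the DFS.
    output: Dict[str, int] = {}
    for groups in partitions:
        output.update(reduce_phase(groups))
    return output, copied


if __name__ == "__main__":
    counts, shuffled = run_job(["the quick brown fox", "the lazy dog"])
    print(counts["the"], shuffled)  # prints: 2 54
```

Every pair the Mapper emits crosses the Mapper-to-Reducer boundary. As a result, the shuffled volume grows with the input size. This is the intermediate copy the abstract identifies as a major source of data transfer.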